{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training a model in the cloud\n",
    "\n",
    "This notebook contains commands and instructions to train models on Google's Cloud ML.\n",
    "\n",
    "Table of contents:\n",
    "\n",
    "- [1 Project setup](#1-Project-setup)\n",
    "- [2 Training a model](#2-Training-a-model)\n",
    "- [3 Publishing a model](#3-Publishing-a-model)\n",
    "- [A References](#A-References)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 Project setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 Creating a project\n",
    "\n",
    "For [\"Tensorflow Basics\" Workshop (AMLD)](https://www.appliedmldays.org/workshop_sessions/tensorflow-basics.1) participants:\n",
    "\n",
    "1. You should have a properly configured project registered to your account.\n",
    "2. Navigate to https://console.cloud.google.com/cloud-resource-manager and select the project `amld-tf-?` by clicking on it (this select the project when following other links further down).\n",
    "\n",
    "For everybody else:\n",
    "\n",
    "1. Follow https://console.cloud.google.com/cloud-resource-manager and click on \"+ CREATE PROJECT\" button to create a new project and follow instructions.\n",
    "2. [Enable Billing](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project) for the new project. If you register a payment method for the first time, you will get a free credit of 300 USD. Note that you might get a small amount (~$2) transacted from your registered credit card, but this transaction will immediately be reversed.\n",
    "3. [Enable Cloud ML API](https://console.cloud.google.com/apis/library) for the new project: search for \"ML\", click on API card and then on \"enable\" button.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 Setting up variables\n",
    "\n",
    "Open https://console.cloud.google.com/cloudshell - All the following commands\n",
    "(in `codefont`) need to be entered in the cloudshell. Note that you need to\n",
    "redefine variables if you open a new tab.\n",
    "\n",
    "Set an environment variable to match your project ID (the \"project=\" parameter in the URL from above step):\n",
    "\n",
    "`PROJECT_ID=<YOUR PROJECT ID>`\n",
    "\n",
    "Define a couple more variables and set default project:\n",
    "\n",
    "`\n",
    "MODELS_BUCKET=\"gs://${PROJECT_ID}-models\"\n",
    "LOCATION=europe-west1\n",
    "gcloud config set project ${PROJECT_ID}\n",
    "`\n",
    "\n",
    "Don't forget to enter these commands in every Cloudshell instance (if you are using multiple tabs)!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 Initializing the project\n",
    "\n",
    "Create the storage bucket where our models will be stored (view them with https://console.cloud.google.com/storage/browser):\n",
    "\n",
    "`gsutil mb -l ${LOCATION} ${MODELS_BUCKET}`\n",
    "\n",
    "Download this repository and navigate to `cloud` directory:\n",
    "\n",
    "`git clone https://github.com/tensorflow/workshops.git && cd workshops/extras/amld/cloud`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2 Training a model\n",
    "\n",
    "To train a new model you need to issue one long command that references\n",
    "a python module (in this case `./quickdraw_rnn/task.py`) containing a\n",
    "`tf.contrib.learn.Experiment`\n",
    "([source](https://github.com/tensorflow/workshops/tree/master/extras/amld/cloud/quickdraw_rnn)).\n",
    "The Cloud ML infrastructure will then take care of running parameter servers,\n",
    "masters and workers for distributed training of the model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Start a training job\n",
    "\n",
    "The following commands starts a job on Cloud ML and sets a couple of\n",
    "[configuration parameters](https://cloud.google.com/ml-engine/docs/training-overview#job_configuration_parameters):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "DATASET='zoo_img'\n",
      "MODEL='quickdraw_cnn'\n",
      "INFO='2k'\n",
      "JOB_NAME=\"${MODEL}_${DATASET%_*}_${INFO}_$(date +%Y%m%d_%H%M%S)\"\n",
      "JOB_DIR=\"${MODELS_BUCKET}/${JOB_NAME}\"\n",
      "\n",
      "DATA=\"gs://amld-tf-data/${DATASET}\"\n",
      "gcloud ml-engine jobs submit training \"${JOB_NAME}\" \\\n",
      "    --package-path \"${MODEL}\" \\\n",
      "    --module-name \"${MODEL}.task\" \\\n",
      "    --staging-bucket \"${MODELS_BUCKET}\" \\\n",
      "    --job-dir \"${JOB_DIR}\" \\\n",
      "    --runtime-version 1.4\\\n",
      "    --region ${LOCATION} \\\n",
      "    --config config/config.yaml \\\n",
      "    -- \\\n",
      "    --data_dir \"$DATA\" \\\n",
      "    --output_dir \"$JOB_DIR\" \\\n",
      "    --train_steps 2000\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "\n",
    "print r'''\n",
    "\n",
    "DATASET='zoo_img'\n",
    "MODEL='quickdraw_cnn'\n",
    "INFO='2k'\n",
    "JOB_NAME=\"${MODEL}_${DATASET%_*}_${INFO}_$(date +%Y%m%d_%H%M%S)\"\n",
    "JOB_DIR=\"${MODELS_BUCKET}/${JOB_NAME}\"\n",
    "\n",
    "DATA=\"gs://amld-tf-data/${DATASET}\"\n",
    "gcloud ml-engine jobs submit training \"${JOB_NAME}\" \\\n",
    "    --package-path \"${MODEL}\" \\\n",
    "    --module-name \"${MODEL}.task\" \\\n",
    "    --staging-bucket \"${MODELS_BUCKET}\" \\\n",
    "    --job-dir \"${JOB_DIR}\" \\\n",
    "    --runtime-version 1.4\\\n",
    "    --region ${LOCATION} \\\n",
    "    --config config/config.yaml \\\n",
    "    -- \\\n",
    "    --data_dir \"$DATA\" \\\n",
    "    --output_dir \"$JOB_DIR\" \\\n",
    "    --train_steps 2000\n",
    "\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Monitor training jobs\n",
    "\n",
    "You can see your current training jobs with\n",
    "\n",
    "`gcloud ml-engine jobs list`\n",
    "\n",
    "or by visiting https://console.cloud.google.com/mlengine/jobs –\n",
    "It will usually take a couple of minutes to setup the jobs on Cloud\n",
    "ML before they appear on the web UI.\n",
    "\n",
    "You can also visualize the training/eval stats of your models using\n",
    "Tensorboard (if Tensorboard doesn't update your training stats, you\n",
    "might have to restart the program...):\n",
    "\n",
    "`tensorboard --port 8080 --logdir \"${MODELS_BUCKET}\"`\n",
    "\n",
    "You can then open Tensorboard in your browser by clicking on the\n",
    "top right browser icon in the header of the cloud shell and select\n",
    "\"Preview on port 8080\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3 Publishing a model\n",
    "\n",
    "The trained model will also be \"exported\" into the `export/Servo`\n",
    "directory. This exported model (in a subdirectory named after the\n",
    "seconds since epoch when the model was exported) will contain all\n",
    "necessary information, including graph, variables, and \"signature\"\n",
    "that defines input and output tensor shapes.\n",
    "\n",
    "\"Publishing\" a model basically means copying one of these exported\n",
    "models for prediction and giving it a label.\n",
    "\n",
    "You can do this either via the\n",
    "[web interface (ML Engine/Models)](https://console.cloud.google.com/mlengine/models)\n",
    "or by issuing the following Cloudshell command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "MODEL_NAME=\"${MODEL}_${DATASET%_*}\"\n",
      "VERSION_NAME=v1\n",
      "\n",
      "gcloud ml-engine models create --regions ${LOCATION} ${MODEL_NAME}\n",
      "ORIGIN=$(gsutil ls \"${JOB_DIR}\"/export/Servo | tail -1)\n",
      "gcloud ml-engine versions create \\\n",
      "    --origin ${ORIGIN} \\\n",
      "    --model ${MODEL_NAME} \\\n",
      "    ${VERSION_NAME}\n",
      "gcloud ml-engine versions set-default --model ${MODEL_NAME} ${VERSION_NAME}\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print r'''\n",
    "\n",
    "MODEL_NAME=\"${MODEL}_${DATASET%_*}\"\n",
    "VERSION_NAME=v1\n",
    "\n",
    "gcloud ml-engine models create --regions ${LOCATION} ${MODEL_NAME}\n",
    "ORIGIN=$(gsutil ls \"${JOB_DIR}\"/export/Servo | tail -1)\n",
    "gcloud ml-engine versions create \\\n",
    "    --origin ${ORIGIN} \\\n",
    "    --model ${MODEL_NAME} \\\n",
    "    ${VERSION_NAME}\n",
    "gcloud ml-engine versions set-default --model ${MODEL_NAME} ${VERSION_NAME}\n",
    "\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can then do online predictions – see https://cloud.google.com/ml-engine/docs/online-predict."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A References\n",
    "\n",
    "- https://cloud.google.com/ml-engine/docs/distributed-tensorflow-mnist-cloud-datalab – Describes same approach used in this notebook.\n",
    "- https://cloud.google.com/solutions/running-distributed-tensorflow-on-compute-engine – Describes how to run distributed Tensorflow in a virtual machine.\n",
    "- https://www.tensorflow.org/deploy/distributed – Learn more about how Tensorflow distributes training on multiple machines."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
